This exercise will introduce five {ggplot2} extension packages: {gghighlight}, {ggdist}, {ggridges}, {patchwork} and {cowplot}.
In this Mini-Project we will be looking at some R packages that
augment plots created using {ggplot2}. One of the great
things about learning {ggplot2} is that many packages
create visualizations that are {ggplot2} objects. This
means they can be augmented and enhanced using these helper
packages.
In this mini project we’ll meet {gghighlight}, {ggdist}, {ggridges}, {patchwork} and {cowplot}, but there are many others to explore.
The dataset is the CDISC Analysis Lab Chemistry (ADLBC) data, which was
created for training/testing purposes. We are focussing on the Alanine
Aminotransferase (ALT) results. For the ALT2 dataset we keep only the “Active”
phase of the study (visits after VISITNUM 3), and we calculate a
new variable, WEEK, that gives the week within the Analysis
Active Treatment Period.
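To see what `floor(ADY/7)` does, here is a minimal sketch on a few hypothetical analysis-day values (not taken from the dataset). Days 1 to 6 fall in week 0, days 7 to 13 in week 1, and so on. Because `floor()` always rounds down, a negative (pre-treatment) day such as -3 would give week -1.

```r
# Hypothetical analysis days (ADY); in the exercise these come from ADLBC
ady <- c(1, 6, 7, 13, 14, 50)

# Integer week within the treatment period, as in mutate(WEEK = floor(ADY/7))
week <- floor(ady / 7)
week
# [1] 0 0 1 1 2 7
```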
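library(tidyverse)  # dplyr verbs, the %>% pipe and ggplot2
library(rio)        # import() reads the SAS transport (.xpt) file
# Read the lab chemistry data and keep only the ALT parameter
ALT <- import("https://github.com/phuse-org/phuse-scripts/raw/master/data/adam/cdisc/adlbc.xpt") %>%
filter(PARAMCD == "ALT")
# Keep the active-phase visits and derive the week from the analysis day
ALT2 <- ALT %>%
filter(VISITNUM > 3) %>%
mutate(WEEK = floor(ADY/7))
# One row per treatment, sorted by the numeric treatment code
TREATfac <- ALT2 %>%
select(TRTA, TRTAN) %>%
unique() %>%
arrange(TRTAN) %>%
mutate(TREATMENT = factor(TRTA, ordered = TRUE))
# Use that order for the treatment factor so plots list treatments by code
ALT2 <- ALT2 %>%
mutate(TREATTXT = factor(TRTP, levels = TREATfac$TRTA))
# Return the distinct values of x; warning() (not simpleWarning()) actually signals
uniqueVal <- function(x){
if(length(unique(x)) > 1) warning("More than one value in column: ", paste(unique(x), collapse = ", "))
unique(x)
}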
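A common pitfall when writing a check like `uniqueVal` is to call `simpleWarning()`, which only creates a warning object and never signals it, so nothing reaches the console. The sketch below contrasts it with `warning()` using two hypothetical functions, `check_silent` and `check_loud`, which are not part of the exercise.

```r
# Hypothetical helper: simpleWarning() only builds a condition object
check_silent <- function(x) {
  if (length(unique(x)) > 1) simpleWarning("More than one value")  # never signalled
  unique(x)
}

# Hypothetical helper: warning() signals the condition, so users and handlers see it
check_loud <- function(x) {
  if (length(unique(x)) > 1) warning("More than one value")
  unique(x)
}

# Catch any warning and replace the result with the string "warned"
res_silent <- tryCatch(check_silent(c(6, 7)), warning = function(w) "warned")
res_loud   <- tryCatch(check_loud(c(6, 7)),   warning = function(w) "warned")

res_silent  # 6 7: no warning was raised
res_loud    # "warned"
```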
First, let’s show a spaghetti plot of ALT measurements over time
(profiles for each subject) and show the Normal Range. Using what we
learned in the last mini project, fill in the mapping
argument for the ggplot and geom_line
calls. We’ll use ADY for the x axis and
LBSTRESN for the y axis. In the geom_line
we’ll want to connect all values for a given subject by using the
group attribute.
We’re also going to use the A1LO and A1HI
columns to make a shaded region showing the normal range of values for
ALT. The alpha argument makes this shaded region more
transparent. We want to draw the shaded geom_ribbon area
once, rather than superimposing the
A1LO and A1HI values from the dataset (which
are repeated for every subject). We use the function
uniqueVal, which we defined earlier, to pick out a single
value for the normal range minimum and maximum.
ALTmin <- uniqueVal(ALT2$A1LO)[1]
ALTmax <- uniqueVal(ALT2$A1HI)[1]
plot1 <- ALT2 %>%
ggplot(mapping = aes(x=ADY, y=LBSTRESN )) +
geom_line(mapping = aes(group=USUBJID )) +
geom_ribbon(mapping = aes(ymin = ALTmin, ymax = ALTmax), fill = "green", alpha = 0.2)
plot1
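It’s often useful to make this first plot to see what the data look like before we “tidy” it. Note that there are some VERY high values. Rather than omit these points, we just want to “zoom in” on a narrower range for the y-axis. coord_cartesian() does this without dropping any data, so the lines are still drawn towards the points that fall outside the visible range. We introduced this in the last Mini-project.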
plot1b <- plot1 +
coord_cartesian(ylim=c(0, 100), xlim = c(0, 125))
plot1b
{gghighlight}
What would be useful is if we could highlight those subjects whose
lab results depart from the normal range. We can do this using the
{gghighlight} package. {gghighlight} takes an
existing plot object and highlights values in the plot that match a
given predicate (boolean expression). gghighlight will make
any values NOT matching the predicate fainter / grey so that the
remaining values are more obvious. If a group attribute is being used in
the plot then it will provide a label to help identify which group(s)
are being highlighted in the plot. Here because we have many subjects in
the data, it skips showing the label.
The predicate used in gghighlight here is evaluated separately for
each subject (the group in the plot): it checks whether any of that
subject’s LBNRIND values are “HIGH” or “LOW”, i.e. whether the subject
ever has a result outside the normal range.
library(gghighlight)
plot1c <- plot1b +
gghighlight::gghighlight(any(LBNRIND == "HIGH" | LBNRIND == "LOW"))
## label_key: USUBJID
## Too many data series, skip labeling
plot1c
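gghighlight evaluates a predicate such as `any(...)` within each group, here each USUBJID. As a rough base-R analogy (not gghighlight’s actual implementation), the same per-subject check can be computed with `tapply()` on a few hypothetical records. Using `%in%` instead of `==` is a small variation: it returns FALSE rather than NA when LBNRIND is missing.

```r
# Hypothetical lab records: three subjects with their reference-range indicator
labs <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S2", "S3"),
  LBNRIND = c("NORMAL", "HIGH", "NORMAL", "NORMAL", "LOW")
)

# Evaluate the predicate once per subject
flagged <- tapply(labs$LBNRIND, labs$USUBJID,
                  function(v) any(v %in% c("HIGH", "LOW")))
flagged  # S1 and S3 are TRUE (highlighted), S2 is FALSE (greyed out)
```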
In order to use facetting with gghighlight you need to
set the argument calculate_per_facet to tell
gghighlight that you want to highlight values that meet the
predicate within each facet. Otherwise gghighlight
will show ALL observations within the facet.
plot1d <- plot1b +
facet_wrap( ~ TRTP) +
gghighlight::gghighlight(any(LBNRIND == "HIGH" | LBNRIND == "LOW"),
calculate_per_facet = TRUE)
## label_key: USUBJID
## Too many data series, skip labeling
plot1d
geom_boxplot to visualize distributions
Boxplots give a useful summary of the distribution of values. They show the median, the quartiles (the edges of the box) and whiskers that extend to the most extreme values within 1.5 times the interquartile range; points beyond the whiskers are drawn individually as outliers. The problem with boxplots is that they do not show how many values each box summarises. So in this next plot we are going to overlay the data points so that we can see both the points and the boxplot summary.
Let’s look at the “Data Visualization with ggplot2” Cheat Sheet to
see what the options are for boxplots. Note that
geom_boxplot is in the section marked “Discrete x,
continuous y”. Since WEEK is a continuous (numeric) value in our data,
the boxplot works best if we convert it to discrete values
using the function as.factor(). Note that when we do this
the weeks are drawn equally spaced along the x-axis, whatever the gap
between them. With the weeks chosen here (0, 5, 10 and so on up to 30)
the gaps happen to be equal, but with unevenly spaced weeks such as
2, 4, 6, 8, 16, 24 and 26, boxes 2 weeks apart and boxes 8 weeks apart
would sit the same distance apart. This is a limitation of treating WEEK as a
categorical variable.
ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot() +
geom_boxplot(mapping = aes(x = as.factor(WEEK), y = LBSTRESN))
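The sketch below shows why converting WEEK with `as.factor()` makes the spacing uniform: the factor keeps only the ordered labels, and a discrete axis places each level at consecutive integer positions. The unevenly spaced weeks are the ones used for the bar charts later in this project.

```r
weeks <- c(2, 4, 6, 8, 16, 24, 26)  # unevenly spaced weeks
f <- as.factor(weeks)

levels(f)      # "2" "4" "6" "8" "16" "24" "26" (sorted numerically, stored as text)
as.integer(f)  # 1 2 3 4 5 6 7: equally spaced positions on a discrete axis
diff(weeks)    # 2 2 2 8 8 2: the real gaps that the discrete axis ignores
```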
Now let’s add data on top of the boxplots. We can do this quite
easily using geom_point. Add mapping arguments
where needed in the code below.
ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot( ) +
geom_boxplot(mapping = aes(x = as.factor(WEEK), y = LBSTRESN)) +
geom_point(mapping = aes(x = as.factor(WEEK), y = LBSTRESN))
It might be more useful to spread out the points a little so that we
can see individual points. We do this using geom_jitter. We
can also apply the alpha setting to the points to add
transparency so we can see the boxplots beneath. The width
option in geom_jitter controls the horizontal jittering of
the points: each point is moved left or right by a random amount of at
most width. Smaller values will cluster them closer to the middle of the
boxplot. By default the jitter is 40% of the spacing between categories
in each direction, which spreads the points across most of the box. Note that
in geom_boxplot we set the argument
outlier.shape = NA, which “turns off” the outliers. Since every point
is already shown via geom_jitter, we don’t need to show the outliers
in the boxplot as well.
plot2b <- ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot(mapping = aes(x = as.factor(WEEK), y = LBSTRESN)) +
geom_boxplot(outlier.shape = NA) +  # outliers are shown by geom_jitter instead
geom_jitter(alpha = 0.2, width = 0.1)
plot2b
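Horizontal jitter simply adds a small random offset to each point’s category position. A minimal base-R sketch with `jitter()`, assuming (as the ggplot2 documentation for position_jitter describes) that width is the maximum offset in each direction, so `width = 0.1` spreads the points over 0.2 units around each category:

```r
set.seed(123)                     # make the random offsets reproducible
x <- rep(1:3, each = 5)           # category positions for three weeks
x_jit <- jitter(x, amount = 0.1)  # add uniform noise in [-0.1, 0.1]

range(x_jit - x)                  # offsets never exceed the amount
```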
Packages like {ggdist} and {ggridges} can
also be used to show the distribution of values. Each of these packages
creates {ggplot2} objects, which can be annotated and used
like any other {ggplot2} object. Typically
{ggplot2} extensions add new geom_ or
stat_ functions.
Let’s use {ggdist} to show dotplots of the data. Because
{ggdist} shows the distribution of points, it’s often
easier to see these horizontally, rather than vertically. In this case
we have rotated the plot, putting the continuous ALT value on the x-axis
and the week value on the y-axis.
library(ggdist)
ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot(mapping = aes(x = LBSTRESN, y = as.factor(WEEK))) +
coord_cartesian(xlim=c(0, 100)) +
stat_dotsinterval(slab_shape = 19, quantiles = 100)
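Setting `quantiles = 100` turns the plot into a quantile dotplot: instead of one dot per observation, the distribution is summarised by 100 quantiles, one dot each. The sketch below shows the idea in base R on simulated, hypothetical data; the exact probabilities and quantile type ggdist uses internally may differ from `ppoints()` and the default of `quantile()`.

```r
set.seed(42)
# Hypothetical right-skewed values standing in for ALT results
alt <- rlnorm(250, meanlog = log(20), sdlog = 0.4)

# One dot per quantile: 100 evenly spaced probabilities between 0 and 1
probs <- ppoints(100)
dots  <- quantile(alt, probs = probs, names = FALSE)

length(dots)  # 100 dots summarise 250 observations
```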
Another option is to show the distribution as a smooth curve, rather
than dots. {ggdist} provides stat_slab() to
help with this. Combining the slab with dots beneath it, as below, gives what is sometimes called a “raincloud
plot”: the density curve is the cloud and the dots fall beneath it like rain.
ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot(mapping = aes(x = LBSTRESN, y = as.factor(WEEK), fill = as.factor(WEEK))) +
coord_cartesian(xlim=c(0, 75)) +
stat_slab() +
stat_dotsinterval(side = "bottom", scale = 0.5, slab_size = NA)
The {ggridges} package takes this presentation of
distributions and overlaps the distribution “ridges” slightly to
minimise space. This makes it easy to compare the
distributions.
library(ggridges)
##
## Attaching package: 'ggridges'
## The following objects are masked from 'package:ggdist':
##
## scale_point_color_continuous, scale_point_color_discrete,
## scale_point_colour_continuous, scale_point_colour_discrete,
## scale_point_fill_continuous, scale_point_fill_discrete,
## scale_point_size_continuous
ALT2 %>%
filter(WEEK %in% c(0, 5, 10, 15, 20, 25, 30)) %>%
ggplot(mapping = aes(x = LBSTRESN, y = as.factor(WEEK))) +
coord_cartesian(xlim=c(0, 80)) +
geom_density_ridges()
## Picking joint bandwidth of 5.07
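The message “Picking joint bandwidth of 5.07” means geom_density_ridges() chose one smoothing bandwidth and used it for every ridge, so the curves are comparable. The sketch below assumes the joint value is the mean of the per-group `bw.nrd0()` (Silverman’s rule of thumb) bandwidths, which is our understanding of how ggridges picks it; the data are simulated and hypothetical. To control the smoothing yourself, geom_density_ridges() also accepts a bandwidth argument, e.g. `geom_density_ridges(bandwidth = 5)`.

```r
set.seed(1)
# Hypothetical ALT-like values for three weeks
d <- data.frame(WEEK = rep(c(0, 5, 10), each = 50),
                LBSTRESN = rlnorm(150, meanlog = log(20), sdlog = 0.4))

# Rule-of-thumb bandwidth for each week...
per_week_bw <- tapply(d$LBSTRESN, d$WEEK, bw.nrd0)

# ...averaged into one "joint" bandwidth shared by every ridge (assumption)
joint_bw <- mean(per_week_bw)
joint_bw
```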
geom_bar to count things
Bar charts can be used to count the number of observations in the
data. Let’s use geom_bar to visualize the incidence of
“Low”, “Normal” and “High” ALT within each week of treatment.
Use the “Data visualization with ggplot2” Cheat Sheet (from the Help
menu in RStudio) to see which aesthetics can be specified for
geom_bar to show how many observations are in
each category of LBNRIND across WEEK.
ALT2 %>%
ggplot() +
geom_bar(mapping = aes(x = WEEK, fill= LBNRIND))
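geom_bar() uses `stat = "count"` by default, so each bar segment is just the number of rows for that combination of x and fill. A quick way to check the numbers behind the chart is a cross-tabulation with `table()`; the records below are hypothetical.

```r
# Hypothetical records: one row per lab result
d <- data.frame(
  WEEK    = c(2, 2, 2, 4, 4, 6),
  LBNRIND = c("NORMAL", "HIGH", "NORMAL", "NORMAL", "LOW", "NORMAL")
)

# Rows per WEEK x LBNRIND combination: the segment heights geom_bar() would stack
counts <- table(d$WEEK, d$LBNRIND)
counts
```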
Most individuals have Lab measurements on weeks 2, 4, 6, 8, 16, 24, 26. Let’s filter the data to show JUST those weeks.
ALT2 %>%
filter(WEEK %in% c(2, 4, 6, 8, 16, 24, 26)) %>%
ggplot( ) +
geom_bar(mapping = aes(x = WEEK,
fill = LBNRIND))
Once again, we want to convert the WEEK variable from
continuous to categorical. As above, we can do that with
as.factor.
plot3 <- ALT2 %>%
filter(WEEK %in% c(2, 4, 6, 8, 16, 24, 26)) %>%
ggplot( ) +
geom_bar(mapping = aes(x = as.factor(WEEK),
fill = LBNRIND))
plot3
In a stacked bar chart it’s hard to see exactly how the LOW and HIGH
groups change over time, because their bars are much shorter
than the NORMAL ones. We can place the LOW, NORMAL and HIGH bars side by side
within each week using the option position = "dodge". Note that
we’re also turning LBNRIND into a factor ordered “LOW” -> “NORMAL”
-> “HIGH”. This is because character columns are sorted alphabetically
by default, which would give the order HIGH, LOW, NORMAL.
plot3b <- ALT2 %>%
filter(WEEK %in% c(2, 4, 6, 8, 16, 24, 26)) %>%
mutate(OUTRANGF = factor(LBNRIND,
levels = c("LOW","NORMAL","HIGH"))) %>%
ggplot() +
geom_bar(mapping = aes(x = as.factor(WEEK),
fill = OUTRANGF),
position = "dodge")
plot3b
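The difference between the default and an explicit level order is easy to see on a small hypothetical vector. Dodged bars (and the legend) follow the level order, which is why OUTRANGF is built with explicit levels.

```r
ind <- c("NORMAL", "HIGH", "LOW", "NORMAL")

# Default: levels are the sorted unique values, i.e. alphabetical
levels(factor(ind))                                       # "HIGH" "LOW" "NORMAL"

# Explicit levels give the order used for OUTRANGF
levels(factor(ind, levels = c("LOW", "NORMAL", "HIGH")))  # "LOW" "NORMAL" "HIGH"
```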
As we did in the last mini project, we can now take plots
plot1b, plot2b and plot3b and tidy up the
axis labels using the labs function.
plot1e <- plot1b +
labs(x = "Relative Days to Treatment",
y = "Lab Results in Standard Units")
plot1e
plot2c <- plot2b +
labs(x = "Week", y = "Lab Results in Standard Units")
plot2c
plot3c <- plot3b +
labs(x = "Week",
y = "Frequency")
plot3c
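{patchwork}
The {patchwork} package allows you to quickly and easily
combine plots. To combine two plots into one, you simply “add” them using the
“+” operator. The “|” operator places plots side by side, “/” stacks them
on top of each other, and parentheses group plots into sub-layouts.
You can find out more on how to specify layouts from the
{patchwork} documentation.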
library(patchwork)
p1 <- plot1e | plot2c
p1
Try out other layouts to combine the three graphs:
p2 <- (plot2c | plot3c) / plot1e
p2
You can then use these combined plots as one object, provide a title
for the combination etc. {patchwork} has its own function
plot_annotation that gives similar functionality to the
labs function in {ggplot2} but annotates the
combined plot object. Use what you learned in mini project 5 to add
titles and subtitles to the combined plot.
plot_annotation also allows you to designate a pattern
for identifying sub-figures using the tag_levels argument. Here
tag_levels = "a" labels the first plot a, the second b, and so on. We’re
also using tag_suffix to put a closing round bracket after each tag.
Try the code below and then see what happens if you specify
tag_levels = "i" instead of "a".
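p2b <- p2 + plot_annotation(
title = "Alanine Aminotransferase (ALT) results",  # replace with your own title
subtitle = "Active treatment phase",               # replace with your own subtitle
tag_levels = "a", tag_suffix = ")")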
p2b
{cowplot}
The {cowplot} package from Claus O. Wilke (hence COW)
has a number of very useful additional functions that work on
{ggplot2} plot objects. Claus’ book “Fundamentals of Data
Visualization” is a really excellent book about creating high
quality data visualizations and I heartily recommend that you read it.
The package turns recommendations from the book into functions that you
can apply in R.
In the code below we are going to add a watermark to the plot we created above.
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:patchwork':
##
## align_plots
## The following object is masked from 'package:lubridate':
##
## stamp
ggdraw(p2b) +
draw_label("DRAFT", color = "grey", alpha=0.3, size = 100, angle = 45)
Using what you’ve learned in this Mini Project 6, please try to create a graph similar to the one attached. Red points show ALT values that are outside the normal range, i.e. where LBNRIND is “HIGH” or “LOW”. You may need to refer back to Mini Project 5 to complete this challenge.
ALT3 <- ALT2 %>%
filter(WEEK %in% c(0, 1, 2, 4, 8, 16)) %>%
filter(LBNRIND %in% c("HIGH", "LOW"))
plot4 <-
ALT2 %>%
filter(WEEK %in% c(0, 1, 2, 4, 8, 16)) %>%
ggplot(mapping = aes(x = LBSTRESN, y = as.factor(WEEK))) +
coord_cartesian(xlim=c(0, 150)) +
stat_dotsinterval(slab_alpha = 0) +
geom_point(data = ALT3, color = "red") +
labs(x = "Alanine Aminotransferase U/L",
y = "Week")
ggdraw(plot4) +
draw_label("DRAFT", color = "grey", alpha=0.3, size = 75, angle = 45)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.4.3 (2025-02-28)
## os Ubuntu 20.04.6 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate C.UTF-8
## ctype C.UTF-8
## tz UTC
## date 2025-06-03
## pandoc 3.1.11 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
## quarto 1.7.29 @ /usr/bin/quarto
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## bslib 0.6.1 2023-11-28 [1] RSPM (R 4.4.2)
## cachem 1.0.8 2023-05-01 [1] RSPM (R 4.4.2)
## cli 3.6.2 2023-12-11 [1] RSPM (R 4.4.2)
## colorspace 2.1-0 2023-01-23 [1] RSPM (R 4.4.2)
## cowplot * 1.1.3 2024-01-22 [1] RSPM (R 4.4.2)
## curl 5.2.0 2023-12-08 [1] RSPM (R 4.4.2)
## digest 0.6.34 2024-01-11 [1] RSPM (R 4.4.2)
## distributional 0.4.0 2024-02-07 [1] RSPM (R 4.4.2)
## dplyr * 1.1.4 2023-11-17 [1] RSPM (R 4.4.2)
## evaluate 0.23 2023-11-01 [1] RSPM (R 4.4.2)
## fansi 1.0.6 2023-12-08 [1] RSPM (R 4.4.2)
## farver 2.1.1 2022-07-06 [1] RSPM (R 4.4.2)
## fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.4.2)
## forcats * 1.0.0 2023-01-29 [1] RSPM (R 4.4.2)
## generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.2)
## ggdist * 3.3.1 2023-11-27 [1] RSPM (R 4.4.2)
## gghighlight * 0.4.1 2023-12-16 [1] RSPM (R 4.4.2)
## ggplot2 * 3.5.0 2024-02-23 [1] RSPM (R 4.4.2)
## ggridges * 0.5.6 2024-01-23 [1] RSPM (R 4.4.2)
## glue * 1.7.0 2024-01-09 [1] RSPM (R 4.4.2)
## gtable 0.3.4 2023-08-21 [1] RSPM (R 4.4.2)
## haven 2.5.4 2023-11-30 [1] RSPM (R 4.4.2)
## highr 0.10 2022-12-22 [1] RSPM (R 4.4.2)
## hms 1.1.3 2023-03-21 [1] RSPM (R 4.4.2)
## htmltools 0.5.7 2023-11-03 [1] RSPM (R 4.4.2)
## jquerylib 0.1.4 2021-04-26 [1] RSPM (R 4.4.2)
## jsonlite 1.8.8 2023-12-04 [1] RSPM (R 4.4.2)
## knitr 1.45 2023-10-30 [1] RSPM (R 4.4.2)
## labeling 0.4.3 2023-08-29 [1] RSPM (R 4.4.2)
## lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.2)
## lubridate * 1.9.3 2023-09-27 [1] RSPM (R 4.4.2)
## magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.2)
## munsell 0.5.0 2018-06-12 [1] RSPM (R 4.4.2)
## patchwork * 1.2.0 2024-01-08 [1] RSPM (R 4.4.2)
## pillar 1.9.0 2023-03-22 [1] RSPM (R 4.4.2)
## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.2)
## purrr * 1.0.2 2023-08-10 [1] RSPM (R 4.4.2)
## quadprog 1.5-8 2019-11-20 [1] RSPM (R 4.4.2)
## R.methodsS3 1.8.2 2022-06-13 [1] RSPM (R 4.4.2)
## R.oo 1.26.0 2024-01-24 [1] RSPM (R 4.4.2)
## R.utils 2.12.3 2023-11-18 [1] RSPM (R 4.4.2)
## R6 2.5.1 2021-08-19 [1] RSPM (R 4.4.2)
## Rcpp 1.0.12 2024-01-09 [1] RSPM (R 4.4.2)
## readr * 2.1.5 2024-01-10 [1] RSPM (R 4.4.2)
## rio * 1.0.1 2023-09-19 [1] RSPM (R 4.4.2)
## rlang 1.1.3 2024-01-10 [1] RSPM (R 4.4.2)
## rmarkdown 2.25 2023-09-18 [1] RSPM (R 4.4.2)
## rstudioapi 0.15.0 2023-07-07 [1] RSPM (R 4.4.2)
## sass 0.4.8 2023-12-06 [1] RSPM (R 4.4.2)
## scales 1.3.0 2023-11-28 [1] RSPM (R 4.4.2)
## sessioninfo 1.2.3 2025-02-05 [1] RSPM (R 4.4.0)
## stringi 1.8.3 2023-12-11 [1] RSPM (R 4.4.2)
## stringr * 1.5.1 2023-11-14 [1] RSPM (R 4.4.2)
## tibble * 3.2.1 2023-03-20 [1] RSPM (R 4.4.2)
## tidyr * 1.3.1 2024-01-24 [1] RSPM (R 4.4.2)
## tidyselect 1.2.0 2022-10-10 [1] RSPM (R 4.4.2)
## tidyverse * 2.0.0 2023-02-22 [1] RSPM (R 4.4.2)
## timechange 0.3.0 2024-01-18 [1] RSPM (R 4.4.2)
## tzdb 0.4.0 2023-05-12 [1] RSPM (R 4.4.2)
## utf8 1.2.4 2023-10-22 [1] RSPM (R 4.4.2)
## vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.2)
## withr 3.0.0 2024-01-16 [1] RSPM (R 4.4.2)
## xfun 0.42 2024-02-08 [1] RSPM (R 4.4.2)
## yaml 2.3.8 2023-12-11 [1] RSPM (R 4.4.2)
##
## [1] /cloud/lib/x86_64-pc-linux-gnu-library/4.4
## [2] /opt/R/4.4.3/lib/R/library
## * ── Packages attached to the search path.
##
## ──────────────────────────────────────────────────────────────────────────────